Abstractive text summarization is the task of generating a short, concise summary that captures the important ideas and phrases of a text. Text summarization has many practical applications in production, including summarizing medical reports for faster analysis and clinical action, media monitoring and surveillance tracking, and data refinement and storage reduction.
There are multiple datasets for abstractive text summarization. The most relevant ones are:
The dataset we chose is XSum, which consists of 226,711 news articles, each accompanied by a one-sentence summary. The articles were collected from the BBC between 2010 and 2017. We use XSum in this project because it lends itself well to transfer learning.
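To illustrate the structure of the dataset, the sketch below loads XSum through the Hugging Face `datasets` library (an assumption on our part; any loading mechanism would do) and prints one article/summary pair. The field names `document` and `summary` are those exposed by that library.

```python
from datasets import load_dataset

# Load the XSum dataset (BBC articles paired with one-sentence summaries).
xsum = load_dataset("xsum")

# Each record exposes the full article ("document") and its reference summary.
sample = xsum["train"][0]
print(sample["document"][:300])  # first 300 characters of the article
print(sample["summary"])         # the one-sentence reference summary
```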
Below are two examples of the desired input and output, where the output is a summarized version of the input.
The word2vec algorithm learns word associations from a large corpus of text using a neural network model. Once trained, the model can detect synonyms and suggest additional words for a sentence. As the name suggests, word2vec associates each distinct word with a vector. The semantic similarity between words can then be approximated by mathematical functions applied to their vectors, such as the cosine similarity between the vectors.
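To make this concrete, here is a minimal sketch using the `gensim` library (our choice for illustration; the report does not prescribe a specific implementation). It trains a tiny Word2Vec model on a toy corpus, which is a placeholder for the tokenized XSum articles, and compares two words by cosine similarity.

```python
from gensim.models import Word2Vec

# Toy corpus: in practice this would be the tokenized XSum articles.
sentences = [
    ["the", "minister", "announced", "a", "new", "budget"],
    ["the", "government", "announced", "a", "new", "policy"],
    ["rain", "delayed", "the", "football", "match"],
]

# Train a small Word2Vec model; vector_size is the embedding dimension.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Cosine similarity between the vectors of two words.
print(model.wv.similarity("minister", "government"))

# Nearest neighbours of a word in the embedding space.
print(model.wv.most_similar("announced", topn=3))
```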
Data augmentation is typically used to reduce overfitting by generating additional training data from transformations applied to the original training data. Back translation is one such technique for NLP tasks: it involves translating samples from the dataset into another language (e.g., from English to French) and then translating them back to the original language. This usually yields a paraphrase of the original sentence, as sketched below.
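The sketch below shows one possible back-translation pipeline using the Hugging Face MarianMT English-French models. The report does not specify which translation system was used, so the model names here are an assumption made for illustration.

```python
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    # Load a pretrained MarianMT translation model and its tokenizer.
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**batch)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in outputs]

original = ["The cabinet met on Tuesday to discuss the proposed budget cuts."]

# English -> French, then French -> English, yielding a paraphrase of the original.
french = translate(original, "Helsinki-NLP/opus-mt-en-fr")
paraphrased = translate(french, "Helsinki-NLP/opus-mt-fr-en")
print(paraphrased[0])
```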
Regarding the results, the paper reports a ROUGE-2 score of 24.56, whereas our run of the pretrained transformer produced 1.934; this gap is due to the reduced training set size imposed by our resource limitations. The first table shows the word2vec modification, whose scores are considerably lower. The data augmentation results were very similar to those of the unmodified model.
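For reference, ROUGE-2 scores of the kind quoted above can be computed with the `rouge_score` package; this is only an illustration of the metric on placeholder sentences, not necessarily the exact evaluation script used in our experiments.

```python
from rouge_score import rouge_scorer

reference = "A man has been arrested after a car crashed into a shop in Cardiff."
generated = "Police arrested a man after a car hit a shop in Cardiff."

# ROUGE-2 measures bigram overlap between the generated and reference summaries.
scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=True)
scores = scorer.score(reference, generated)
print(scores["rouge2"].fmeasure)
```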
In conclusion, we believe that with sufficient resources, more interesting results could have been obtained. For future work, we plan to train the model for longer and perform hyperparameter tuning on the word embeddings. We may also introduce a different type of dataset.